Fundamentally, large language models are built by gathering an extraordinary amount of linguistic data (much of it codified on the internet), finding statistical correlations between words (more accurately, between sub-word units called "tokens"), and then predicting what output should follow a particular prompt. For all the alleged complexity of generative AI, these systems are, at their core, models of language.
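To make that concrete, here is a toy sketch in Python: a bigram counter that tallies which word follows which in a tiny, invented corpus and samples continuations from those counts. This is far cruder than a real transformer, and the corpus is my own illustration, but the underlying task is the same: predict the next token.

```python
from collections import Counter, defaultdict
import random

# Toy corpus (invented for illustration); real models train on vast
# swaths of internet text, broken into sub-word tokens.
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Tally how often each word follows each other word.
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Sample a next word in proportion to how often it followed `word`."""
    candidates = follows[word]
    if not candidates:  # word never appeared mid-corpus; nothing to predict
        return None
    return random.choices(list(candidates), weights=candidates.values())[0]

# Generate a continuation from a one-word "prompt".
word, output = "the", ["the"]
for _ in range(6):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))  # e.g. "the cat sat on the mat"
```

Scale this idea up from word pairs to long-range patterns learned across billions of parameters, and you have the basic shape of a language model: correlation in, prediction out.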
As a journalist who covers AI, I hear from countless people who seem utterly convinced that ChatGPT, Claude, or some other chatbot has achieved "sentience." Or "consciousness." Or, my personal favorite, "a mind of its own." The Turing test was aced a while back, yes, but unlike raw intelligence, qualities like sentience and consciousness are not so easily pinned down. Large language models will claim to think for themselves, even describe inner torments or profess undying love, but such statements don't imply interiority.
Games provide a clear, unambiguous signal of success. Their structured rules and measurable outcomes make them a natural testbed for evaluating models and agents. They force a model to demonstrate a range of skills, including strategic reasoning, long-term planning, and dynamic adaptation against an intelligent opponent, yielding a robust signal of general problem-solving ability.
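As a minimal sketch of what such an evaluation harness might look like, here is one in Python built around the game of Nim (take 1 to 3 stones per turn; whoever takes the last stone wins). The `random_agent` function is a hypothetical stand-in for the model under test; in a real harness it would wrap calls to a model API. The win rate against perfect play is the clear, measurable score.

```python
import random

def optimal_move(pile: int) -> int:
    """Perfect play: leave the opponent a multiple of 4 when possible."""
    return pile % 4 or random.randint(1, min(3, pile))

def random_agent(pile: int) -> int:
    """Hypothetical stand-in for the model under test (e.g., an LLM
    prompted with the board state and asked for a move)."""
    return random.randint(1, min(3, pile))

def play(agent, pile: int = 21) -> bool:
    """Run one game with the agent moving first; return True if it wins."""
    agent_turn = True
    while pile > 0:
        take = agent(pile) if agent_turn else optimal_move(pile)
        pile -= take
        if pile == 0:
            return agent_turn  # whoever took the last stone wins
        agent_turn = not agent_turn
    return False

# The eval itself: win rate over many games is the unambiguous signal.
games = 1000
wins = sum(play(random_agent) for _ in range(games))
print(f"win rate vs. optimal play: {wins / games:.1%}")
```

The design point is that nothing here is subjective: the agent either takes the last stone or it doesn't, which is exactly the property that makes games attractive as benchmarks.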